Performance of Fault Tolerant Networks of Workstations

Author

  • John Morris
Abstract

Functional or dataflow models of computation enable a program’s run-time system to determine which portions of it must be repeated when faults occur. Modifications were made to the run-time system of Cilk, a threaded C, to enable a Network of Workstations (NoW) to tolerate fail-stop faults of individual processors or the network. The overheads needed to provide this fault tolerance are shown to be mainly CPU cycles and memory, with little additional network load being generated in the absence of faults. This makes it feasible to run long computations successfully on NoWs where ownership, control and distribution of the individual processors may be widely distributed.

Similar resources

Fault-Tolerant Matrix Operations for Networks of Workstations Using Diskless Checkpointing

Networks of workstations (NOWs) offer a cost-effective platform for high-performance, long-running parallel computations. However, these computations must be able to tolerate the changing and often faulty nature of NOW environments. We present high-performance implementations of several fault-tolerant algorithms for distributed scientific computing. The fault-tolerance is based on diskless chec...

Fault Tolerant Matrix Operations for Networks of Workstations Using Multiple Checkpointing

Recently, an algorithm-based approach using diskless checkpointing has been developed to provide fault tolerance for high-performance matrix operations. With this approach, since fault tolerance is incorporated into the matrix operations, the matrix operations become resilient to any single processor failure or change with low overhead. In this paper, we present a technique called multiple chec...

Parallel Processing on Networks of Workstations: A Fault-Tolerant, High Performance Approach

One of the most sought-after software innovations of this decade is the construction of systems using off-the-shelf workstations that actually deliver, and even surpass, the power and reliability of supercomputers. Many researchers are using conventional techniques such as RPC, DSM, replication, causal communications and other techniques to provide parallel computing facilities on workstation ne...

Fault-Tolerant Clusters of Workstations with Single System Image

The computing trend is moving from clustering high-end mainframes to clustering desktop computers. This trend is triggered by the widespread use of PCs, workstations, Gigabit networks, and middleware support for clustering. This paper presents new approaches to achieving fault tolerance and single system image (SSI) in a workstation cluster. A multicomputer cluster is a collection of node compute...

On Synchronisation in Fault-Tolerant Data and Compute Intensive Programs over a Network of Workstations

An application structured as a fault-tolerant bag of tasks adapts easily to changing resources. To be represented by a single bag of tasks, a computation must decompose into purely independent tasks. The work summarised here investigates performance of structuring approaches applicable where this ideal is not possible, partly through analysis and partly through measurements of a realistic fault...

Publication date: 1999